Superword Processors 22 June 2001
Not recorded
Abstract
This paper introduces the concept of a superword processor, a style of computer-architecture design in which a traditional processor datapath is replicated in SIMD fashion, transforming each machine instruction into a SIMD equivalent. Superword techniques are appealing because they require minimal changes to an existing design, preserve backward compatibility with an existing ISA, and offer large potential for performance improvements. This paper also describes new compiler techniques that are key to the successful use of a superword processor. Automatic parallelization of general-purpose, sequential applications is needed to provide a seamless interface to the user. One of the most important tasks of the compiler is grouping multiple memory operations into a single wide load or store. We show that this grouping can greatly increase memory bandwidth, and that a superword-extended VLIW machine can outperform a clustered VLIW with similar resources by a factor of 2.43. Even compared with an ideal VLIW machine, we achieve speed-ups of 1.84.
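The grouping of memory operations mentioned in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, which the abstract does not spell out. It is a simplified greedy pass written for illustration, and the names `MemOp` and `group_adjacent`, the 16-byte superword width, and the decision to ignore alignment and data dependences are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemOp:
    kind: str    # "load" or "store"
    base: str    # symbolic base address, e.g. an array name (hypothetical)
    offset: int  # byte offset from the base
    size: int    # access size in bytes


def group_adjacent(ops: List[MemOp], width: int = 16) -> List[List[MemOp]]:
    """Greedily pack contiguous accesses of the same kind and base into
    groups that fit in one `width`-byte wide access.

    Assumption: alignment and data dependences are ignored here; a real
    compiler would have to prove the reordering is legal.
    """
    groups: List[List[MemOp]] = []
    for op in sorted(ops, key=lambda o: (o.kind, o.base, o.offset)):
        if groups:
            first, prev = groups[-1][0], groups[-1][-1]
            contiguous = (
                op.kind == prev.kind
                and op.base == prev.base
                and op.offset == prev.offset + prev.size
            )
            fits = op.offset + op.size - first.offset <= width
            if contiguous and fits:
                groups[-1].append(op)
                continue
        groups.append([op])  # start a new wide access
    return groups


# Four adjacent 4-byte loads from "a" plus one unrelated load from "b".
ops = [MemOp("load", "a", off, 4) for off in (0, 4, 8, 12)]
ops.append(MemOp("load", "b", 0, 4))
groups = group_adjacent(ops)
```

In this example the five scalar loads collapse into two memory instructions, one 16-byte wide load covering `a` and one ordinary load from `b`. Reducing the number of memory instructions in this way is the kind of effect the abstract credits with increasing memory bandwidth.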
Similar resources
Application of a superword array in genome assembly
We introduce a data structure called a superword array for finding quickly matches between DNA sequences. The superword array possesses some desirable features of the lookup table and suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called...
Exploiting Superword-Level Locality in Multimedia Extension Architectures
In this paper, we describe an algorithm and implementation of locality optimizations for architectures with instruction sets such as Intel’s SSE and Motorola’s AltiVec that support operations on superwords, i.e., aggregate objects consisting of several machine words. We treat the large superword register file as a compiler-controlled cache, thus avoiding unnecessary memory accesses by exploitin...
Evaluation of Architectural Paradigms for Addressing the Processor-Memory Gap
Many high-performance applications run well below the peak arithmetic performance of the underlying machine, with inefficiencies often attributed to poor memory system behavior. In the context of scientific computing we examine three emerging processors designed to address the well-known gap between processor and memory performance through the exploitation of data parallelism. The VIRAM architec...
Pack Transposition: Enhancing Superword Level Parallelism Exploitation (John von Neumann Institute for Computing)
© 2006 by John von Neumann Institute for Computing. Permission to make digital or hard copies of portions of this work for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise requires prior specific permission by the publisher ment...
Evaluating compiler technology for control-flow optimizations for multimedia extension architectures
This paper addresses how to automatically generate code for multimedia extension architectures in the presence of conditionals. We evaluate the costs and benefits of exploiting branches on the aggregate condition codes associated with the fields of a superword (an aggregate object larger than a machine word) such as the branch-on-any instruction of the AltiVec. Branch-on-superword-conditioncodes...
Publication date: 2001